% please textwrap! It helps svn not have conflicts across a multitude of
% lines.
%
% vim:set textwidth=78:

\subsection{Training on Learning Data}
\label{sec:methodology}
As in the first three assignments, before working with this large training
set we first establish a flow for evaluating whether a given decision
improves the accuracy of the trained models. As performance measures, we use
precision and recall to find the best parameters for the logistic regression
model and RMSE to find the best parameters for the linear regression model.
We then set up cross validation with stratified sampling to evaluate the
models produced.
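
For reference, the measures above can be written with their standard
definitions. The notation is introduced here for illustration only: $TP$,
$FP$ and $FN$ denote the numbers of true positives, false positives and false
negatives of the logistic regression model, and $y_i$ and $\hat{y}_i$ denote
the actual and predicted values of the linear regression model over $n$
examples.
\[
  \mathrm{precision} = \frac{TP}{TP + FP},
  \qquad
  \mathrm{recall} = \frac{TP}{TP + FN},
\]
\[
  \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}.
\]
Higher precision and recall and lower RMSE indicate a better model.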

We follow the flow explained in Section~\ref{sec:flow}, except that we move
normalization and missing-value replacement inside cross validation. This
eliminates leakage from the test folds into the training folds of cross
validation.
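
As a sketch of what this means, assume z-score normalization of a single
feature $x$ (an assumption made for illustration; the flow may use a
different normalization). Let $T_k$ be the set of training examples in
iteration $k$ of cross validation. The statistics are then computed on $T_k$
only,
\[
  \mu_k = \frac{1}{|T_k|} \sum_{i \in T_k} x_i,
  \qquad
  \sigma_k = \sqrt{\frac{1}{|T_k|} \sum_{i \in T_k} (x_i - \mu_k)^2},
\]
and applied to every example $j$, whether it belongs to the training folds or
to the test fold:
\[
  \tilde{x}_j = \frac{x_j - \mu_k}{\sigma_k}.
\]
Missing values can be handled in the same way, for example by replacing a
missing $x_j$ with $\mu_k$. If $\mu_k$ and $\sigma_k$ were instead computed
once on the whole data set, the values in the test fold would influence the
inputs seen during training.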

We arrive at the final models for predicting TARGET\_B and TARGET\_D shown in
Figure \ref{fig:linear_model} and Figure \ref{fig:logistic_model}. We try
different combinations of features by trial and error and evaluate each
combination by its effect on precision and recall for logistic regression and
on RMSE for linear regression.



